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Abstract — Interior-point methods are state-of-the-art al- 
gorithms for solving linear programming (LP) problems 
with polynomial complexity. Specifically, the Karmarkar 
algorithm typically solves LP problems in time 0{n^'^), 
where n is the number of unknown variables. Karmarkar's 
celebrated algorithm is known to be an instance of the 
log-barrier method using the Newton iteration. The main 
computational overhead of this method is in inverting 
the Hessian matrix of the Newton iteration. In this 
contribution, we propose the application of the Gaussian 
beUef propagation (GaBP) algorithm as part of an efficient 
and distributed LP solver that exploits the sparse and 
symmetric structure of the Hessian matrix and avoids the 
need for direct matrix inversion. This approach shifts the 
computation from realm of linear algebra to that of proba- 
biUstic inference on graphical models, thus applying GaBP 
as an efficient inference engine. Our construction is general 
and can be used for any interior-point algorithm which 
uses the Newton method, including non-linear program 
solvers. 

I. Introduction 

In recent years, considerable attention has been 
dedicated to the relation between belief propagation 
message passing and linear programming schemes. 
This relation is natural since the maximum a- 
posteriori (MAP) inference problem can be trans- 
lated into integer linear programming (ILP) [1]. 

Weiss et al. [1] approximate the solution to the 
ILP problem by relaxing it to a LP problem using 
convex variational methods. In [2], tree-reweighted 
belief propagation (BP) is used to find the global 
minimum of a convex approximation to the free 
energy. Both of these works apply discrete forms 



of BP. Globerson et al. [3], [4] assume convexity 
of the problem and modify the BP update rules 
using dual-coordinate ascent algorithm. Hazan et 
al. [5] describe an algorithm for solving a general 
convex free energy minimization. In both cases the 
algorithm is guaranteed to converge to the global 
minimum as the problem is tailored to be convex. 

In the present work we take a different path. Un- 
like most of the previous work which uses gradient- 
descent methods, we show how to use interior-point 
methods which are shown to have strong advantages 
over gradient and steepest descent methods. (For a 
comparative study see [6, §9.5,p. 496].) The main 
benefit of using interior point methods is their 
rapid convergence, which is quadratic once we are 
close enough to the optimal solution. Their main 
drawback is that they require heavier computational 
effort for forming and inverting the Hessian ma- 
trix, needed for computing the Newton step. To 
overcome this, we propose the use of Gaussian BP 
(GaBP) [7], [8], which is a variant of BP applicable 
when the underlying distribution is Gaussian. Using 
GaBP, we are able to reduce the time associated 
with the Hessian inversion task, from 0(n^-^) to 
0(nplog(e)/log(7)) at the worst case, where p < n 
is the size of the constraint matrix A, e is the 
desired accuracy, and l/2<7<lisa parameter 
characterizing the matrix A. This computational 
saving is accomplished by exploiting the sparsity 
of the Hessian matrix. 

An additional benefit of our GaBP-based ap- 
proach is that the polynomial-complexity LP solver 



can be implemented in a distributed manner, en- 
abling efficient solution of large-scale problems. 

We also provide what we believe is the first 
theoretical analysis of the convergence speed of the 
GaBP algorithm. 

The paper is organized as follows. In Section 
ini we reduce standard linear programming to a 
least-squares problem. Section UlI] shows how to 
solve the least-squares problem using the GaBP 
algorithm. In Section |lVl we extend our construction 
to the primal-dual method. We give our convergence 
results for the GaBP algorithm in Section |Vl and 
demonstrate our construction in Section |VI] using 
an elementary example. We present our conclusions 
in Section jVlll 

II. Standard Linear Programming 
Consider the standard linear program 



minimize^ 



T 
C X 



subject to Ax = b, x > 



(la) 
(lb) 



where A G with rank{A} = p < n. We 

assume the problem is solvable with an optimal 
X* assignment. We also assume that the problem 
is strictly feasible, or in other words there exists 
X G M" that satisfies Ax = b and x > 0. 

Using the log-barrier method [6, §11.2], one gets 



mimmizCx,^ 
subject to 



T 
C X 



-/iS^^ilogXfc (2a) 
Ax = b. (2b) 



This is an approximation to the original problem 
(fTal ). The quality of the approximation improves as 
the parameter /i ^ 0. 

Now we would like to use the Newton method 
in for solving the log-barrier constrained objective 
function (|2al) . described in Table HI Suppose that we 
have an initial feasible point xq for the canonical 
linear program (fTal) . We approximate the objective 
function (|2a1 ) around the current point x using a 
second-order Taylor expansion 

/(i + Ax) ~ /(i) + /'(x) Ax + l/2Ax^/"(x) Ax. 

(3) 

Finding the optimal search direction Ax yields the 
computation of the gradient and compare it to zero 

df 



9Ax 



/'(i) + r(5c)Ax = 0, 



(4) 



Ax=-r(i)-7'(i 



(5) 



Denoting the current point x = (x, /x, y) and 
the Newton step Ax = (x, y,/i), we compute the 
gradient 

/'(x, yu, y) = (5/(x, fi, y)/(9x, (9/(x, fi, y)/dfi, 

,<9/(x,;U,y)/9y) 

The Lagrangian is 

£(x,/i,y) = c^x-/iEfclogXfe + y^(b- Ax), (7) 



<9>C(x,/i,y) 



9x 



c-/iX-^l-y^A = 0, (8) 



a2£(x,/i,y) 2 



dx. 



(9) 



where X = diag(x) and 1 is the all-one column 
vector. Substituting ([S])-® into (g]), we get 



c-/iX"il-y^A + /iX"^x = 0, (10) 



-2, 



c - fiX-H + x/iX-2 = y^A, 



dC{x,fi,y) 
dy 



Ax = 0. 



(11) 



(12) 



Now multiplying (fTTj) by AX^, and using (fT2l) to 
eliminate x we get 



AX^AV = AX^c - ^AXl. (13) 

These normal equations can be recognized as gen- 
erated from the linear least-squares problem 



min I |XA^y - Xc - /iAXl||^. 



(14) 



Solving for y we can compute the Newton direction 
X, taking a step towards the boundary and compose 
one iteration of the Newton algorithm. Next, we will 
explain how to shift the deterministic LP problem to 
the probabilistic domain and solve it distributively 
using GaBP. 



TABLE I 

The Newton algorithm [6, §9.5.2] . 

Given feasible starting point xq and tolerance e > 0, k = 1 
Repeat 1 Compute the Newton step and decrement 
Ax = r(x)-if(x), A2 = f(xfAx 

2 Stopping criterion, quit if < e 

3 Line search. Choose step size t by backtracking line search. 

4 Update, x^ := xa;_i + tAx, k = k + 1 



III. From LP to Probabilistic Inference 

We start from the least-squares problem (fT4l) . 
changing notations to 

min||Fy-g||^, (15) 
y 

where F = XA^, g = Xc+/iAXl. Now we define 
a multivariate Gaussian 

p(x) ^ p(x, y) oc exp(-l/2(Fy - g)^I(Fy - g)). 

(16) 

It is clear that y, the minimizing solution of (fT5l) . 
is the MAP estimator of the conditional probability 

y = argmaxp(y|x) = 
y 

= 7Vr((F^F)-^F^g, (F^F)-^). (17) 

Recent results by Bickson and Shental et al. [7]- 
[9] show that the pseudoinverse problem (fTTI) can 
be computed efficiently and distributively by using 
the GaBP algorithm. 

The formulation (fT6l ) allows us to shift the least- 
squares problem from an algebraic to a probabilistic 
domain. Instead of solving a deterministic vector- 
matrix linear equation, we now solve an inference 
problem in a graphical model describing a certain 
Gaussian distribution function. Following [9] we 
define the joint covariance matrix 

C4(;IF) (18) 

and the shift vector b = {0^, g^}^ G M(p+")>^i. 

Given the covariance matrix C and the shift 
vector b, one can write explicitly the Gaussian 
density function, p(x) , and its corresponding graph 
Q with edge potentials ('compatibility functions') 



^pij and self-potentials ('evidence') 0j. These graph 
potentials are determined according to the follow- 
ing pairwise factorization of the Gaussian distribu- 
tion ]9(x) OC lll=iM^i) resulting 
in ipij{xi,Xj) = exp(— XjCjjXj), and (f)i{xi) = 
exp (biXi — Ciix1/2). The set of edges {i, j} cor- 
responds to the set of non-zero entries in C (fT8l) . 
Hence, we would like to calculate the marginal 
densities, which must also be Gaussian, 

p(x.) r.N{^^. = {C-'gh, Pr' = 

\/i > p, 

where fii and Pi are the marginal mean and inverse 
variance (a.k.a. precision), respectively. Recall that, 
according to [9], the inferred mean /i^ is identical 
to the desired solution y of (fTTI) . The GaBP update 
rules are summarized in Table HIl 

It is known that if GaBP converges, it results in 
exact inference [10]. However, in contrast to con- 
ventional iterative methods for the solution of sys- 
tems of linear equations, for GaBP, determining the 
exact region of convergence and convergence rate 
remain open research problems. All that is known is 
a sufficient (but not necessary) condition [11], [12] 
stating that GaBP converges when the spectral ra- 
dius satisfies p(|Ix — A|) < 1. A stricter sufficient 
condition [10], determines that the matrix A must 
be diagonally dominant (i.e., |ajj| > kijl)^^) 
in order for GaBP to converge. Convergence speed 
is discussed in Section |Vl 

IV. Extending the Construction to the 
Primal-Dual Method 

In the previous section we have shown how to 
compute one iteration of the Newton method using 
GaBP. In this section we extend the technique for 



TABLE II 

Computing x = A^^b via GaBP [7]. 



# 


Stage 


Operation 


1. 


Initialize 


Compute Pa = An and fin = hi/ An. 
oet = U and /ifcj = U, vk ^ t. 


z. 


Iterate 


Propagate P^-i and /i^j, VA; 7^ i such that ^ 0. 

Compute Pi\j = Pa + EfceN(i)\i /^Ai = P^^j{P^^^^ii + EfcgN(i)\i ^fc^/^fci)- 
Compute = -AjjP.^^Mji and = -P[^AijHi\j. 


3. 


Check 


If Pij and yUjj did not converge, return to #2. Else, continue to #4. 


4. 


Infer 


Pi ~ Pa + X]fceN(i) -^fc* ^ f'i ~ Pi {Piif'ii + XlA:eN(i) Pkif^ki) ■ 


5. 


Output 





computing the primal-dual method. This construc- 
tion is attractive, since the extended technique has 
the same computation overhead. 

The dual problem ( [13]) conforming to (fTal) can 
be computed using the Lagrangian 

£(x, y, z) = c^x + y^(b - Ax) - z^x, z > 0, 



c/(y,z) = inf /:(x,y,z), (19a) 

X 

subject to Ax = b, X > 0. (19b) 



while 



g/:(x,y,z) 



c - A^y - z = 0. (20) 



Substituting ^ into (fT9al) we get 

maximizCy b^y 
subject to A^y + z = c, z > 0. 

Primal optimality is obtained using ([8]) [13] 

y^A = c-/xX-4. (22) 

Substituting (|22l) in (I21al) we get the connection 
between the primal and dual 

= z. 

In total, we have a primal-dual system (again we 
assume that the solution is strictly feasible, namely 
X > 0,z > 0) 



Ax = b, X > 0, 



A y + z = c, z > 0, 
Xz = fil. 



The solution [x(/i), y(/i), z(/i)] of these equations 
constitutes the central path of solutions to the log- 
arithmic barrier method [6, 11.2.2]. Applying the 
Newton method to this system of equations we get 






A^ 




/ Ax 


A 





i) 


Ay 


Z 







\ Az 



b- Ax 
- A^y- 
fil-Xz 



(24) 



The solution can be computed explicitly by 

Ay = CAZ-i^A^^^-i 



Ax 
Az 



'AZ-^XA' , 

'AZ-iX(c-/iX-il- A- 
-i(A^Ay + /iX-4 



XZ 



-A^Ay + c - A'V - z. 



y)-Fb- Ax), 
= c + A^y), 



The main computational overhead in this method 
is the computation of (AZ~^XA-^)~^, which is 
derived from the Newton step in ([5]). 

Now we would like to use GaBP for computing 
the solution. We make the following simple change 
to (l24l) to make it symmetric: since z > 0, we can 
multiply the third row by Z^ and get a modified 
symmetric system 

A^ / \ / Ax \ / b - Ax 
AO Ay = c-A^y-z 

/ Z-^X J \ Az J \ fiZ-^l-X 

/ A^ / \ 
Defining A = A , and b = 

\ I z-^x / 

b- Ax \ 
c — A^y — z . one can use GaBP iterative 
/iZ-^l-X J 
algorithm shown in Table HIl 



In general, by looking at © we see that the 
solution of each Newton step involves inverting the 
Hessian matrix /"(x). The state-of-the-art approach 
in practical implementations of the Newton step 
is first computing the Hessian inverse /"(x)^^ by 
using a (sparse) decomposition method like (sparse) 
Cholesky decomposition, and then multiplying the 
result by /'(x). In our approach, the GaBP al- 
gorithm computes directly the result Ax, without 
computing the full matrix inverse. Furthermore, if 
the GaBP algorithm converges, the computation of 
Ax is guaranteed to be accurate. 

V. New Convergence Results 

In this section we give an upper bound on the 
convergence rate of the GaBP algorithm. As far as 
we know this is the first theoretical result bounding 
the convergence speed of the GaBP algorithm. 

Our upper bound is based on the work of Weiss 
et al. [10, Claim 4], which proves the correctness 
of the mean computation. Weiss uses the pairwise 
potentials forrrQ, where 

p(x) OC Ilijtpij{Xi,Xj)Ili1pi{Xi), 

iljij{xi,Xj) = exp{-l/2{xi Xj)^\'ij{xi Xj)), 



dominant, we define e-i to be the non negative gap 



v. 



bji Cij 



Assuming the optimal solution is x*, for a desired 
accuracy e| |b| \^ where | |b| |oo = maxj |bj|, and b is 
the shift vector, we need to run the algorithm for at 
most t = [log(e)/log(/3)] rounds to get an accuracy 
of \x* — Xt\ < e||b||oo where j3 = maxj^ \bij/cij\. 

The problem with applying Weiss' result directly 
to our model is that we are working with different 
parameterizations. We use the information form 
p(x) OC exp(— l/2x^Ax+b-^x). The decomposition 
of the matrix A into pairwise potentials is not 
unique. In order to use Weiss' result, we propose 
such a decomposition. Any decomposition from 
the canonical form to the pairwise potentials form 
should be subject to the following constraints [10] 



A 



TjjCij 



A-ii . 



We propose to initialize the pairwise potentials as 
following. Assuming the matrix A is diagonally 

'Weiss assumes scalar variables with zero means. 



Si 



A,; 



> 0. 



S j I Ajj I 

and the following decomposition 

bij = Ajj, Cij = Ajj + Si/\N{i)\, 

where |A^(«)| is the number of graph neighbors of 
node i. Following Weiss, we define 7 to be 



7 



\b 

max 

i,i |c. 



«J I 









1 ^ij 


\+eJ\N{i)\ 



max — 

i,j 1 



< 1. 



(25) 



(5.)/(|a,,||iVW|) 

In total, we get that for a desired accuracy of e| |b| |oo 
we need to iterate for t = [log(e)/log(7)] rounds. 
Note that this is an upper bound and in practice 
we indeed have observed a much faster convergence 
rate. 

The computation of the parameter 7 can be easily 
done in a distributed manner: Each node locally 
computes Ei, and % = maxj 1/(1 + \aij\ei/N{i)). 
Finally, one maximum operation is performed glob- 
ally, 7 = maxi7i. 

A. Applications to Interior-Point Methods 

We would like to compare the running time of 
our proposed method to the Newton interior-point 
method, utilizing our new convergence results of 
the previous section. As a reference we take the 
Karmarkar algorithm [14] which is known to be 
an instance of the Newton method [15]. Its running 
time is composed of n rounds, where on each round 
one Newton step is computed. The cost of comput- 
ing one Newton step on a dense Hessian matrix is 
0{n^'^), so the total running time is 0{n^-^). 

Using our approach, the total number of Newton 
iterations, n, remains the same as in the Karmarkar 
algorithm. However, we exploit the special structure 
of the Hessian matrix, which is both symmetric 
and sparse. Assuming that the size of the constraint 
matrix A is n x p, p < n, each iteration of 
GaBP for computing a single Newton step takes 
0{np), and based on our new convergence analysis 
for accuracy e||b||oo we need to iterate for r = 
[log(e) /log(7)] rounds, where 7 is defined in (|25l) . 
The total computational burden for a single Newton 
step is 0(r2j9log(e)/log(7)). There are at most n 
rounds, hence in total we get 0(n^plog(e)/log(7)). 




0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 



Fig. 1. A simple example of using GaBP for solving linear 
programming with two variables and eleven constraints. Each red 
circle shows one iteration of the Newton method. 



VI. Experimental Results 

We demonstrate the applicability of the proposed 
algorithm using the following simple linear program 
borrowed from [16] 

maximize xi + X2 

subject to 2pxi + X2 < p'^ + I , 
p = 0.0,0.1,- ■■ ,1.0 . 

Fig. \T\ shows execution of the affine-scaling al- 
gorithm [17], a variant of Karmarkar's algorithm 
[14], on a small problem with two variables and 
eleven constraints. Each circle is one Newton step. 
The inverted Hessian is computed using the GaBP 
algorithm, using two computing nodes. Matlab code 
for this example can be downloaded from [18]. 

Regarding larger scale problems, we have ob- 
served rapid convergence (of a single Newton step 
computation) on very large scale problems. For 
example, [19] demonstrates convergence of 5-10 
rounds on sparse constraint matrices with several 
millions of variables. [20] shows convergence of 
dense constraint matrices of size up to 150, 000 x 
150, 000 in 6 rounds, where the algorithm is run in 
parallel using 1,024 CPUs. Empirical comparison 
with other iterative algorithms is given in [8]. 

VII. Conclusion 

In this paper we have shown how to efficiently 
and distributively solve interior-point methods using 



an iterative algorithm, the Gaussian belief propaga- 
tion algorithm. Unlike previous approaches which 
use discrete belief propagation and gradient descent 
methods, we take a different path by using con- 
tinuous belief propagation applied to interior-point 
methods. By shifting the Hessian matrix inverse 
computation required by the Newton method, from 
linear algebra domain to the probabilistic domain, 
we gain a significant speedup in performance of 
the Newton method. We believe there are numerous 
applications that can benefit from our new approach. 

Acknowledgement 

O. Shental acknowledges the partial support of 
the NSF (Grant CCF-05 14859). D. Bickson would 
like to thank Nati Linial from the Hebrew University 
of Jerusalem for proposing this research direction. 
The authors are grateful to Jack Wolf and Paul 
Siegel from UCSD for useful discussions and for 
constructive comments on the manuscript. 

References 

[1] C. Yanover, T. Meltzer, and Y. Weiss, "Linear programming 
relaxations and belief propagation - an empirical study," in 
Journal of Machine Learning Research, vol. 7. Cambridge, 
MA, USA: MIT Press, 2006, pp. 1887-1907. 

[2] Y. Weiss, C. Yanover, and T. Meltzer, "Map estimation, linear 
programming and belief propagation with convex free energies," 
in The 23th Conference on Uncertainty in Artificial Intelligence 
(UAl), 2007. 

[3] A. Globerson and T. Jaakkola, "Fixing max-product: Conver- 
gent message passing algorithms for map Ip-relaxations," in 
Advances in Neural Information Processing Systems (NIPS), 
no. 21, Vancouver, Canada, 2007. 

[4] M. Collins, A. Globerson, T. Koo, X. Carreras, and P. Bartlett, 
"Exponentiated gradient algorithms for conditional random 
fields and max-margin markov networks," in Journal of Ma- 
chine Learning Research. Accepted for publication, 2008. 

[5] T. Hazan and A. Shashua, "Convergent message-passing al- 
gorithms for inference over general graphs with convex free 
energy," in The 24th Conference on Uncertainty in Artificial 
Intelligence (UAI), Helsinki, July 2008. 

[6] S. Boyd and L. Vandenberghe, Convex Optimization. Cam- 
bridge University Press, March 2004. 

[7] O. Shental, D. Bickson, P H. Siegel, J. K. Wolf, and D. Dolev, 
"Gaussian belief propagation solver for systems of linear equa- 
tions," in IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, 
Canada, July 2008. 

[8] D. Bickson, O. Shental, P H. Siegel, J. K. Wolf, and D. Dolev, 
"Linear detection via belief propagation," in Proc. 45th Allerton 
Conf. on Communications, Control and Computing, Monticello, 
IL, USA, Sept. 2007. 

[9] , "Gaussian belief propagation based multiuser detection," 

in IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, 
July 2008. 



[10] Y. Weiss and W. T. Freeman, "Correctness of belief propagation 
in Gaussian graphical models of arbiti'ary topology," Neural 
Computation, vol. 13, no. 10, pp. 2173-2200, 2001. 

[11] J. K. Johnson, D. M. Mahoutov, and A. S. Willsky, "Walk- 
sum interpretation and analysis of Gaussian belief propagation," 
in Advances in Neural Information Processing Systems 18, 
Y. Weiss, B. Scholkopf, and J. Piatt, Eds. Cambridge, MA: 
MIT Press, 2006, pp. 579-586. 

[12] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, "Walk-sums 
and belief propagation in Gaussian graphical models," Journal 
of Machine Learning Research, vol. 7, Oct. 2006. 

[13] S. Portnoy and R. Koenker, "The gaussian hare and the laplacian 
tortoise: Computability of squared- error versus absolute-error 
estimators," in Statistical Science, vol. 12, no. 4. Institute of 
Mathematical Statistics, 1997, pp. 279-296. 

[14] N. Karmarkar, "A new polynomial-time algorithm for linear 
programming," in STOC '84: Proceedings of the sixteenth 
annual ACM symposium on Theory of computing. New York, 
NY, USA: ACM, 1984, pp. 302-311. 

[15] D. A. Bayer and J. C. Lagarias, "Karmarkar's Unear pro- 
gramming algorithm and newton's method," in Mathematical 
Programming, vol. 50, no. 1, March 1991, pp. 291-330. 

[16] http://en.wikipedia.org/ wiki /Karmarkar ' s_algorithm. 

[17] R. J. Vanderbei, M. S. Meketon, and B. A. Freedman, "A 
modification of karmarkar's linear programming algorithm," in 
Algorithmica, vol. 1, no. 1, March 1986, pp. 395^07. 

[18] http : / /www. cs . hu ji . ac . il/labs/danss/p2p/ gabp/. 

[19] D. Bickson and D. Malkhi, "A unifying framework for rating 
users and data items in peer-to-peer and social networks," 
in Peer-to-Peer Networking and Applications (PPNA) Journal, 
Springer-Verlag, April 2008. 

[20] D. Bickson, D. Dolev, and E. Yom-Tov, "A gaussian behef 
propagation solver for large scale support vector machines," 
in 5th European Conference on Complex Systems, Jerusalem, 
Sept. 2008. 



